Commissioner for Patents
Alexandria, Virginia 22313-1450

I, the undersigned inventor of the above-referenced application, having been admonished that willful false statements and the like are punishable by fine, imprisonment, or both (18 U.S.C. § 1001) and may jeopardize the validity of the application or any patent issuing thereon, declare as follows:

I am an inventor of the above-captioned patent application [the Application]. As indicated in the document attached hereto as Exhibit "A", entitled Disclosure AUS8'J999-N&'i (Method and System for Audio File Searching Using Voice / Text Keys) [the Disclosure], my co-inventors and I conceived of a system for receiving a text-based input, converting the text input to a corresponding diphthong sequence, encoding the diphthong sequence, and using the encoded diphthong sequence to search and compare encodings of diphthong sequences taken from the audio content of a storage device such as a CD. As described in the Disclosure, the invention was workable at least as early as September 28, 1998. The invention was documented in the Disclosure on or about September 30, 1998, submitted to a patent review committee, sent to
Commissioner for Patents, Serial No. 09/49B,23^

Section L Affidavit, Art Unit 2654

Page 2 of 2, Examiner: A. Armstrong

Docket: AUS990879US1

outside counsel on or about November 19, 1999, and ultimately drafted and filed as the currently pending patent application on February 3, 2000.

I further declare that all statements made of my own knowledge are true and all statements made on information and belief are believed to be true.



Jason Baumgartner



Date )0/l'^/lOO<i 



Nadeem Malik Date



Steven Roberts Date
Summary



Required fields are marked with the asterisk (*) and must be filled in to complete the form.



Status 


Under Evaluation 


Processing Location


AUS 


Functional Area


07 - STQ * CORPORATE SOFTWARE TECH. (C. LOOAH) 


Attorney/Patent 
Professional 


Mark McBurney/Austin/IBM


IDT Team 


Tim Dietz/Austin/IBM; Nadeem Malik/Austin/IBM


Submitted Date 


11/05/99 10:31:38 PM 


Owning Division 


CHQ 


PVT Score 


To calculate a PVT score, use the 'Calculate PVT' button.


Lab 




Technology Code 




Incentive Program 


(tNC4) PC Sender and Consumer Products 



Inventors with Lotus Notes IDs 

Inventors: Steven Roberts/Austin/IBM, Jason Baumgartner/Austin/IBM, Nadeem Malik/Austin/IBM



Inventor Name

» denotes primary contact



Baumgartner, Jason



Inventor Serial    Div/Dept    Manager Serial    Manager Name



•ijb9ao^--Oar6|! 



Inventors without Lotus Notes IDs 
IDT Selection 



Tim Dietz/Austin/IBM
Nadeem Malik/Austin/IBM



Response Due to IP&L: 12/08/99



Main Idea 



[AAh^y^^aMtlRrdfaaato : 



*Title of disclosure

Method and System for Audio File Searching Using Voice / Text Keys



*Idea of disclosure
1. Describe your invention, stating the problem solved (if appropriate), and indicating the advantages of using the invention.

This invention discloses a system that allows audio files to be searched for vocal segments, e.g., for a certain word or sequence of words. The search basis can be presented to the system in the form of a spoken word or its typed equivalent.
For instance, instead of fast-forwarding to a portion of your favorite song on a compact disc or a segment of a movie (with perhaps many backward-and-forward iterations), such a scheme would allow the user to speak (or type) the desired phrase (e.g., a song lyric or a line from the movie) and have the media player find the location of the desired item. Disclosed is a scheme that illustrates how such a system may be implemented.

Such a scheme has many advantages over existing state-of-the-art media devices, which require the user to react to media content flashing across a television screen, or to an audio stream played at such a rate that it is almost unintelligible (namely, manual high-speed fast-forward and rewind searching). This is a time-consuming and somewhat annoying process. Using this disclosure, we can shift this tedious manual effort to the media device by building in sufficient intelligence to allow the user to specify what they are looking for, and having the media device do the necessary scanning to find matches. The user is thereby presented only with likely matches and does not need to spend his/her own time interacting with the media device to find them. Such a search may also typically be carried out quickly since, for example, a 40x CD reader can parse audio from a CD at 40 times the normal rate. A computer may further scan the high-speed analog signals from analog video or audio media with higher precision than a human being, allowing all forms of media (analog or digital) to be scanned at higher speeds than a human can achieve, and without the annoying back-and-forth typical of manual, error-prone scanning.

2. How does the invention solve the problem or achieve an advantage (a description of "the invention", including figures inline as appropriate)?

This invention pertains to the use of a high-performance audio speech interpretation system to allow automated scanning of audio files for speech patterns (words or sequences of words). The user inputs the search key, either by speaking the fragment to be searched for or by entering the text of that fragment. The system converts this key to a series of diphthongs (a primitive construct of the language of the audio file) using existing voice recognition technology; these are represented as a string of symbols. The system next begins parsing the audio file, which may be a digital file (e.g., a .wav file, a CD, or the audio track of a DVD) or an analog medium (e.g., an audio tape or the audio track of a VCR tape) read through an analog-to-digital converter. The media is transformed into a sequence of diphthongs, again a string of symbols. As such, our symbolic audio file search is reduced to string pattern matching.
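To make the reduction to string matching concrete, here is a minimal Python sketch, assuming each diphthong has already been recognized as a symbolic label. The ARPAbet-style labels such as `AY` and `OW`, and the names `SymbolCodec` and `find_all`, are hypothetical choices for illustration. Mapping every distinct symbol to a one-byte code turns both the search key and the media into byte strings, so an ordinary substring search does the matching.

```python
from typing import Dict, List, Sequence


class SymbolCodec:
    """Hypothetical helper: assigns each distinct diphthong symbol a one-byte
    code, so a symbol sequence becomes a bytes string."""

    def __init__(self) -> None:
        self.codes: Dict[str, int] = {}

    def encode(self, symbols: Sequence[str]) -> bytes:
        out = bytearray()
        for sym in symbols:
            if sym not in self.codes:
                if len(self.codes) == 256:
                    raise ValueError("more than 256 distinct symbols")
                self.codes[sym] = len(self.codes)
            out.append(self.codes[sym])
        return bytes(out)


def find_all(key: bytes, media: bytes) -> List[int]:
    """Return every symbol offset at which key occurs in media (overlaps included)."""
    hits: List[int] = []
    if not key:
        return hits
    pos = media.find(key)
    while pos != -1:
        hits.append(pos)
        pos = media.find(key, pos + 1)
    return hits


# Example with assumed ARPAbet-style labels.
codec = SymbolCodec()
key = codec.encode(["AY", "OW"])
media = codec.encode(["EY", "AY", "OW", "OY", "AY", "OW"])
print(find_all(key, media))  # [1, 4]
```

The key and the media must go through the same codec instance so that identical symbols receive identical codes; a key symbol never seen in the media simply gets a fresh code and cannot produce a false match.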



A flowchart for this scheme is depicted in the figure below. First, the search pattern (entered manually by speech or text) is encoded as a sequence of diphthongs. Next, the audio file to be searched is translated on the fly into a sequence of diphthongs. The "match" box uses standard pattern matching algorithms to look for instances of the search sequence within the file sequence. If a match is found, it is reported to the user (by playing the media from the point of the match). If this is not the correct match, the user may opt to "find next instance".
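Because the file is translated on the fly, the "match" box may receive media symbols one at a time. One standard algorithm that fits this setting is Knuth-Morris-Pratt (KMP), sketched below in Python as a generator so that each `next()` call plays the role of "find next instance". KMP is swapped in here only as an example of a standard algorithm; the disclosure does not name one, and the function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Sequence


def failure_table(pattern: Sequence[str]) -> List[int]:
    """fail[i] = length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it (the standard KMP table)."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def find_instances(pattern: Sequence[str], stream: Iterable[str]) -> Iterator[int]:
    """Hypothetical sketch: lazily yield the start offset of each occurrence
    of pattern in stream.

    Symbols are consumed one at a time and never re-read, so the stream can be
    produced on the fly; each next() call acts as one "find next instance".
    Overlapping occurrences are reported.
    """
    if not pattern:
        return
    fail = failure_table(pattern)
    k = 0  # number of pattern symbols currently matched
    for pos, sym in enumerate(stream):
        while k > 0 and sym != pattern[k]:
            k = fail[k - 1]
        if sym == pattern[k]:
            k += 1
        if k == len(pattern):
            yield pos - len(pattern) + 1
            k = fail[k - 1]


stream = iter(["EY", "AY", "OW", "AY", "OW", "OY"])  # assumed symbol labels
matches = find_instances(["AY", "OW"], stream)
print(next(matches))  # 1
```

A second `next(matches)` would return 3, the next instance, without re-reading any symbols already consumed, which is what lets the search keep pace with a high-speed source.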
An alternate embodiment of this disclosure would also allow for "fuzzy pattern matching", which is useful for the following reasons:
1) to desensitize the system to variations in speech-to-diphthong technology;
2) to allow the user to use partial phrases.
Algorithms that perform fuzzy pattern matching are in common use today (e.g., "suggestion" generators for spelling checkers). As the technology advances, more exact matches will result. This idea allows the system to be developed without exposure to the exact behavior of the speech-to-diphthong technology.
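One common way to realize this kind of fuzzy matching is approximate substring search with edit distance (Sellers' variant of the dynamic-programming edit-distance algorithm): a match may start anywhere in the media, and up to `max_edits` symbol insertions, deletions, or substitutions are tolerated. The Python sketch below assumes the same symbolic diphthong labels as before; the function name and the threshold parameter are hypothetical, and the disclosure itself does not specify this algorithm.

```python
from typing import List, Sequence, Tuple


def fuzzy_find(pattern: Sequence[str], text: Sequence[str],
               max_edits: int) -> List[Tuple[int, int]]:
    """Hypothetical sketch of approximate substring search (Sellers' algorithm).

    Returns (end_index, distance) for each text position j where some substring
    of text ending at j is within max_edits insertions, deletions, or
    substitutions of pattern.
    """
    m = len(pattern)
    # prev[i]: best distance between pattern[:i] and a substring ending just
    # before the current text symbol (initially: before any text is read).
    prev = list(range(m + 1))
    hits: List[Tuple[int, int]] = []
    for j, sym in enumerate(text):
        cur = [0] * (m + 1)  # cur[0] = 0: a match may begin at any position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == sym else 1
            cur[i] = min(prev[i - 1] + cost,  # match or substitution
                         prev[i] + 1,         # extra symbol in the media
                         cur[i - 1] + 1)      # pattern symbol missing from the media
        if cur[m] <= max_edits:
            hits.append((j, cur[m]))
        prev = cur
    return hits


# The recognizer "heard" UW where the key expects OW (assumed labels).
print(fuzzy_find(["AY", "OW", "EY"], ["OY", "AY", "UW", "EY", "OW"], 1))  # [(3, 1)]
```

The single substitution in the example models the speech-to-diphthong variation named in the first reason above. Neighboring end positions can both fall under the threshold, so a real system would probably keep only the best-scoring end position in each cluster of hits.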

An embodiment allowing partial phrases as input would permit the user to specify wildcards to further narrow down the fuzzy search. For example, assume that the user is really looking for the quote "All work and no play makes Jack a dull boy" but does not recall the exact pattern, only that it starts with "All work" and ends with "dull boy". The user can specify a wildcard between these fragments (searching for either fragment alone, without context, could yield too many matches). The algorithm can optionally use heuristics (or user-specified parameters) to limit the depth of the wildcard: it is unlikely that there will be more than a few dozen diphthongs between the known fragments, and this parameter can avoid a match with 30 minutes of text between the fragments.
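The bounded wildcard can be sketched as follows: find each occurrence of the leading fragment, then accept the nearest occurrence of the trailing fragment that begins no more than `max_gap` symbols later. This Python sketch uses exact matching for the two fragments for brevity (a fuzzy matcher could be substituted); `max_gap` stands in for the heuristic or user-specified depth limit, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple


def occurrences(pattern: Sequence[str], text: Sequence[str]) -> List[int]:
    """Start offsets of exact occurrences of pattern in text (naive scan)."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if list(text[i:i + m]) == list(pattern)]


def wildcard_find(head: Sequence[str], tail: Sequence[str],
                  text: Sequence[str], max_gap: int) -> List[Tuple[int, int]]:
    """Hypothetical sketch: match head, then 0..max_gap arbitrary symbols,
    then tail. Returns (start, end_exclusive) spans, pairing each head
    occurrence with the nearest tail occurrence that fits within the gap."""
    tail_starts = occurrences(tail, text)  # ascending order
    spans: List[Tuple[int, int]] = []
    for h in occurrences(head, text):
        gap_start = h + len(head)
        for t in tail_starts:
            if gap_start <= t <= gap_start + max_gap:
                spans.append((h, t + len(tail)))
                break  # first qualifying tail is the nearest one
    return spans


# Single letters stand in for diphthong symbols here.
text = list("xabqqcdyab" + "q" * 10 + "cd")
print(wildcard_find(list("ab"), list("cd"), text, max_gap=3))  # [(1, 7)]
```

With `max_gap=3`, the second "ab" is rejected because its nearest "cd" lies ten symbols away; raising the limit to 10 would accept it as well. That is the trade-off the depth limit controls.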

Our proposed implementation of this invention involves the use of a fast speech recognizer (e.g., a speech-pattern-to-diphthong converter). Optionally, if the search pattern is to be specified as text, a text-to-diphthong converter will also be needed. Both converters are relatively straightforward: speech recognition software exists today and may be used to recognize diphthongs. Similarly, a heuristic text-to-diphthong converter, which may be made exact by a dictionary file, can be built using technology similar to that employed by "phone in and have your email read to you" systems.
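A dictionary-first text-to-diphthong converter might look like the following Python sketch. Everything in it is hypothetical: the tiny `PRONUNCIATIONS` table stands in for the dictionary file, the ARPAbet-style symbols (which include ordinary phonemes as well as true diphthongs) are an assumed notation, and `FALLBACK_RULES` is a deliberately crude set of spelling rules, far simpler than a real text-to-speech front end. Words found in the dictionary are converted exactly; other words fall back to the heuristic.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical stand-in for the dictionary file (ARPAbet-style, stress omitted).
PRONUNCIATIONS: Dict[str, List[str]] = {
    "all": ["AO", "L"],
    "work": ["W", "ER", "K"],
    "play": ["P", "L", "EY"],
    "boy": ["B", "OY"],
}

# Hypothetical, deliberately crude spelling rules for words not in the
# dictionary; "igh" is listed first so it is tried before shorter spellings.
FALLBACK_RULES: List[Tuple[str, str]] = [
    ("igh", "AY"), ("ay", "EY"), ("oy", "OY"), ("ow", "OW"), ("ou", "AW"),
]


def word_to_symbols(word: str) -> List[str]:
    if word in PRONUNCIATIONS:
        return list(PRONUNCIATIONS[word])  # exact: dictionary hit
    symbols: List[str] = []
    i = 0
    while i < len(word):
        for spelling, sym in FALLBACK_RULES:
            if word.startswith(spelling, i):
                symbols.append(sym)
                i += len(spelling)
                break
        else:
            # No rule applies: emit the letter itself as a placeholder symbol.
            symbols.append(word[i].upper())
            i += 1
    return symbols


def text_to_symbols(text: str) -> List[str]:
    """Convert a typed search key into one flat symbol sequence."""
    symbols: List[str] = []
    for word in re.findall(r"[a-z]+", text.lower()):
        symbols.extend(word_to_symbols(word))
    return symbols


print(text_to_symbols("Play, boy!"))  # ['P', 'L', 'EY', 'B', 'OY']
```

Growing the dictionary shrinks the share of words that reach the heuristic, which is the sense in which a dictionary file can make the converter exact.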

This disclosed system has several advantages and applications. The first is home use in multimedia devices: this scheme may greatly reduce the time and manual, error-prone effort involved in fast-forward and rewind searching for scenes of a movie or parts of musical works. The second is in more technical fields, where it may be used to search long narratives, interviews, proceedings, or surveillance files for exact phrases.

Note that such a scheme may be much faster than a human. Even setting aside the error-prone back-and-forth narrowing down of the target, a human can only comprehend speech up to a certain speed; for example, when listening to an audio file, a human may only comprehend speech played at up to about 8 times the normal rate. A computer may parse audio much faster, either from a digital source (e.g., a 40x CD-ROM reader) or from an analog source (e.g., playback of an audio tape at high speed), and extract diphthongs from these high-speed sources. Such a scheme could also be enhanced to parse compressed audio files (e.g., MP3) directly.
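As a rough illustration of the speed claim, assume a full 74-minute audio CD and assume the diphthong extraction can keep pace with the drive (neither assumption comes from the disclosure). The nominal scan times at 40x and at the roughly 8x human limit then compare as follows:

```latex
t_{40\times} = \frac{74\ \text{min}}{40} = 1.85\ \text{min} = 111\ \text{s},
\qquad
t_{8\times} = \frac{74\ \text{min}}{8} = 9.25\ \text{min}
```

That is a five-fold difference before any manual back-and-forth searching is counted; the figures are nominal rates, not measurements.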

3. If the same advantage or problem has been identified by others (inside/outside IBM), how have those others solved it, and does your solution differ and why is it better?

Other companies have successfully marketed this concept in forms such as personal address assistants that look up a person's phone number when you speak the person's name. Our solution is more robust and can handle much larger jobs because of the flexibility gained by performing the diphthong analysis. Moreover, if the dictionary can be searched in some logical fashion without being expanded, the technique can be applied directly to compressed audio streams.

For instance, by modifying our example algorithm, it is possible to find a diphthong stream in an MP3 file. In other words, this technique is superior because it works in an intermediate format that takes advantage of scalable media, whereas the existing techniques perform simple correlations on speech patterns over a time interval.

4. If the invention is implemented in a product or prototype, include technical details, purpose, disclosure details to others, and the date of that implementation.